Fix: Enforce strict 4-digit limit on JSON Unicode escapes to prevent greedy parsing #3116

BGamboa13 · 2025-11-21T19:12:00Z

Summary of Changes

This PR fixes a logic issue in AbstractJsonLexer where Unicode escape sequences (\uXXXX) could be parsed greedily. Previously, if a valid 4-digit Unicode escape was immediately followed by a character in the range [a-fA-F0-9], the lexer would incorrectly consume that character as part of the hex sequence. The result was a wrong decoded character or a parsing error.

Examples:
Input: "\u00f3n" (intended: "ón"). Previous behavior: parsed correctly, because 'n' is not a hex digit.
Input: "\u00f3a" (intended: "óa"). Previous behavior: the lexer greedily consumed 'a' as a fifth hex digit.
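The difference between greedy and fixed-width hex consumption can be reproduced with two regular expressions. This is only an analogy, not the lexer's actual code. Kotlin is assumed because AbstractJsonLexer appears to be a Kotlin class, and GREEDY_ESCAPE, STRICT_ESCAPE and decodeWith are hypothetical names.

```kotlin
// Illustrative analogy only: AbstractJsonLexer does not use regexes.
// Greedy: consumes every hex digit that follows "\u".
val GREEDY_ESCAPE = Regex("""\\u([0-9a-fA-F]+)""")

// Strict (RFC 8259): consumes exactly four hex digits after "\u".
val STRICT_ESCAPE = Regex("""\\u([0-9a-fA-F]{4})""")

// Replaces every escape matched by [pattern] with the UTF-16 code unit its
// captured hex digits encode. (The greedy pattern can capture values above
// 0xFFFF, for which Char(...) throws; the inputs used here stay in range.)
fun decodeWith(pattern: Regex, input: String): String =
    pattern.replace(input) { match ->
        Char(match.groupValues[1].toInt(16)).toString()
    }
```

With the greedy pattern, the 7-character input \u00f3a decodes to the single character U+0F3A (0x0F3A) instead of "óa". The input \u00f3n decodes to "ón" under both patterns, because 'n' stops the greedy match after four digits.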

Technical Details

RFC 8259 Compliance: The parser now enforces a strict limit of exactly 4 hex digits following \u, as required by the JSON specification.
Performance Optimization: Replaced the previous character conversion logic (fromHexChar()) with a precomputed static HEX_TABLE. Each hex digit is now converted with a single array lookup plus a validity check, which reduces branching inside the loop.
Safety: Added an explicit bounds check (currentPosition + 4 >= source.length) to prevent an IndexOutOfBoundsException on malformed input that ends in the middle of an escape sequence.
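The PR text does not show the new appendHex() body, so the following is a minimal sketch of the lookup-table approach and the bounds check under stated assumptions. HEX_TABLE, hexValue and decodeUnicodeEscape are hypothetical names. pos is assumed to index the first hex digit, so the bounds check is written as pos + 4 <= source.length rather than in the PR's currentPosition-based form.

```kotlin
// Hypothetical sketch: HEX_TABLE, hexValue and decodeUnicodeEscape are
// illustrative names, not the actual AbstractJsonLexer implementation.

// Precomputed table: ASCII code -> hex value, or -1 if not a hex digit.
val HEX_TABLE: IntArray = IntArray(128) { -1 }.also { table ->
    for (c in '0'..'9') table[c.code] = c - '0'
    for (c in 'a'..'f') table[c.code] = c - 'a' + 10
    for (c in 'A'..'F') table[c.code] = c - 'A' + 10
}

// One array lookup per character; non-ASCII input falls outside the table.
fun hexValue(c: Char): Int =
    if (c.code < HEX_TABLE.size) HEX_TABLE[c.code] else -1

// Decodes exactly four hex digits starting at [pos] (assumed to be the index
// just after "\u") and returns the UTF-16 code unit they encode. Anything
// after the fourth digit (e.g. the 'a' in "\u00f3a") is left for the caller.
fun decodeUnicodeEscape(source: CharSequence, pos: Int): Char {
    // Bounds check: all four digits must lie inside the input.
    require(pos + 4 <= source.length) {
        "Unexpected end of input in unicode escape at index $pos"
    }
    var code = 0
    for (i in pos until pos + 4) {
        val digit = hexValue(source[i])
        require(digit >= 0) { "Invalid hex digit '${source[i]}' at index $i" }
        code = (code shl 4) or digit
    }
    return Char(code)
}
```

For example, decodeUnicodeEscape("\\u00f3a", 2) returns 'ó', and a caller would resume at index 6, where 'a' is read as an ordinary character. The sketch uses a hand-built ASCII table rather than Kotlin's Char.digitToIntOrNull(16), because that function also accepts some non-ASCII digit characters that JSON does not allow in \u escapes.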

Tests

Added testUnicodeEscapeWithFollowingHex to verify that \u00f3a is correctly parsed as the character ó followed by the character a.
Verified existing tests pass with the new optimized lookup table.

…greedy parsing

The JSON lexer was incorrectly parsing unicode escapes by consuming more characters than the standard 4 hex digits specified in RFC 8259. This could lead to incorrect parsing when valid hex characters followed a unicode escape sequence.

Changes:
- Refactored appendHex() to strictly parse exactly 4 hex digits
- Replaced fromHexChar() with a fast O(1) lookup table (HEX_TABLE)
- Added explicit validation that fails on invalid hex digits
- Added test case to verify correct parsing of \u00f3a as 'óa'
